Blood leukocyte transcriptional modules and differentially expressed genes associated with disease severity and age in COVID-19 patients

Since the molecular mechanisms determining COVID-19 severity are not yet well understood, there is a demand for biomarkers derived from comparative transcriptome analyses of mild and severe cases, combined with patients’ clinico-demographic and laboratory data. Here the transcriptomic response of human leukocytes to SARS-CoV-2 infection was investigated by focusing on the differences between mild and severe cases and between age subgroups (younger and older adults). Three transcriptional modules correlated with these traits were functionally characterized, as well as 23 differentially expressed genes (DEGs) associated to disease severity. One module, correlated with severe cases and older patients, had an overrepresentation of genes involved in innate immune response and in neutrophil activation, whereas two other modules, correlated with disease severity and younger patients, harbored genes involved in the innate immune response to viral infections, and in the regulation of this response. This transcriptomic mechanism could be related to the better outcome observed in younger COVID-19 patients. The DEGs, all hyper-expressed in the group of severe cases, were mostly involved in neutrophil activation and in the p53 pathway, therefore related to inflammation and lymphopenia. These biomarkers may be useful for getting a better stratification of risk factors in COVID-19.

. Clinical and demographic characteristics of the COVID-19 hospitalized (Severe group) and oligosymptomatic (Mild group) patients. U unknown, NA not applicable, mo months, yrs years, d days. Thirteen patients in the Mild group opted for not answering the question relative to race. All statistical analyses were performed using Mann Whitney test, except for the analyses indicated by * or ** where Chi-square or Fisher exact tests were used. In bold, significant p-values < 0.05. www.nature.com/scientificreports/ 10 °C until RNA extraction. WBC were lysed with RLT buffer, and the total RNA was extracted using QIAamp ® RNA Blood Mini kit (Qiagen). RNA purity analysis and quantification were performed using the NanoVue spectrophotometer (GE Heathcare Life Sciences, Marlborough, MA). RNA quality was assessed on the Agilent BioAnalyzer 2100 (Agilent, Santa Clara, USA). All RNA samples were stored at − 80 °C until used in hybridization experiments.

Microarray hybridization.
A total of 23 RNA samples were used for gene expression analysis and grouped as Severe (n = 11) or Mild (n = 12) according to the patient's characteristics (Supplementary Table S2). These two groups showed no significant differences regarding age, sex, and time of whole blood collection after the first day of COVID-19 symptoms. The Severe group was further divided into subgroups A and B, where A included the samples from adolescents and younger adults (age ranging from 11 to 38 years; n = 5) and B included the samples from older adults (age ranging from 41 to 62 years; n = 6). Similarly, the Mild group was divided in subgroups C and D, where C included the samples from adolescents and younger adults (age ranging from 10 to 37 years; n = 7) and D included the samples from older adults (age ranging from 41 to 64 years; n = 5). The patients between zero and nine years (Severe and Mild groups) were not included due to insufficient RNA quality.
To determine gene expression profiles, 4 × 44 K DNA microarrays (Whole Human Genome Microarray Kit, Agilent Technologies, cat no. G4845A) were used. The procedures for hybridization using the fluorescent dye Cy3 followed the manufacturer's protocols (One-Color Microarray-Based Gene Expression Analysis-Quick Amp Labeling). The images were captured by the reader Agilent Bundle according to the parameters recommended for bioarrays and extracted by Agilent Feature Extraction software version 11.5.1.1 (https:// www. agile nt. com/). Spots with two or more flags (low intensity, saturation, controls, etc.) were considered as NA, that is, without valid expression value. All microarray raw data have been deposited in GEO public database (http:// www. ncbi. nlm. nih. gov/ geo), a MIAME compliant database, under accession number GSE193022.
Gene expression analysis. An in-house algorithm in R environment (version 3.6.2) 10 was used for excluding transcripts presenting one or more missing values (NAs) per group and for converting gene expression values to log base 2. Through this procedure we obtained two gene expression data matrices: (i) one for gene coexpression network (GCN) analysis, and (ii) another for differential gene expression (DEG) analysis. The GCN matrix had 6375 Gene Ontology (GO) annotated genes after excluding all NAs. The DEG matrix had 15,248 GO annotated genes, including NAs and with a minimum of four valid gene expression values per group. Boxplot analysis was used for outlier detection ( Supplementary Fig. S2). Data normalization for both data matrices was performed using limma package version 3.9 15 in R environment (version 3.6.2) 10 . The differential gene expression analyses were conducted in three comparisons separately: Weighted gene co-expression network analysis (WGCNA). The network was constructed using the WGCNA package 16 (version 1.69-81; https:// horva th. genet ics. ucla. edu/ html/ Coexp ressi onNet work/ Rpack ages/ WGCNA/) in R version 3.6.2 environment 10 . No outliers were detected in the analysis of gene expression data matrix. Pearson's correlation coefficient was used for obtaining gene co-expression similarity measures and for the subsequent construction of an adjacency matrix using soft power and topological overlap matrix (TOM). Soft thresholding process transforms the correlation matrix to mimic the scale-free topology. TOM is used to filter weak connections during network construction. Module identification is based on TOM and on average linkage hierarchical clustering. The soft power β = 20 (R 2 = 0.9160) was chosen using the scale-free topology criterion ( Supplementary Fig. S3). Finally, Dynamic Tree Cut algorithm 6 was used for dendrogram's branch selection. The module eigengene (ME) is defined as the first principal component of a given module, which can be considered a representative of the gene expression profiles in a module. Module Membership (MM), also known as eigengene-based connectivity (kME), is defined as the correlation of each gene expression profile with the module eigengene of a given module.
Module-trait association. Module-trait association analysis was accomplished using the WGCNA package (version 1. [69][70][71][72][73][74][75][76][77][78][79][80][81] 16 in R environment (version 3.6.2) 10 . For this analysis we considered as specific traits: severity group, age subgroup, sex, race, comorbidities, therapy, clinical laboratory characteristics, hospital stay, and blood collection time point after first symptoms onset. Only traits present in three or more patients were considered in the module-trait association analysis. Subsequently, the gene significance (GS), i.e., a value for the correlation between specific traits and gene expression profiles 16 was obtained. The mean GS value for a particular module is considered as a measure of module significance (MS). Modules presenting a significant p-value (p < 0.05) and a positive trait correlation were selected for functional analysis.
Node categorization. All genes belonging to a given module were designated here as ME genes (for module eigengene gene). The modules significantly correlated to specific traits (age group, clinical, and laboratorial data) were further evaluated for identifying relevant hub genes, i.e., the highly connected genes, here termed high hierarchy (HH) genes, that hold the transcriptional network together and are also associated to specific cellular processes or link different biological processes 17 . Thus, connectivity measures were used for the hierarchical categorization of hub genes, considering connectivity values related to the network (overall connectivity) and to the module (intramodular connectivity for each gene based on its Pearson's correlation with all other genes in the module).
Intramodular node connectivity was calculated considering: (i) kTotal, the whole network connectivity of each gene; (ii) kWithin, gene connections with other genes in the same module 16 . Genes presenting high kTotal and kWithin are classified as high hubs (Hhubs), genes presenting high kTotal but low kWithin are called www.nature.com/scientificreports/ eHubs, and genes presenting high kWithin but low kTotal are the iHubs. The iHubs connect most of the genes in a transcriptional module, whereas the eHubs connect different transcriptional modules, and the Hhubs hold the transcriptional modules and the network together 18 . The top 10 genes presenting the highest kTotal and/or kWithin values were selected as HH genes. All gene values were plotted in a kTotal (x-axis) vs. kWithin (y-axis) graph. Additionally, the expression profile was assessed for the selected hubs through GS values, i.e., the GS of the nth gene is the correlation measure between the nth gene expression and the specific trait. Positive or negative GS values mean that the nth gene is hyper-or hypo-expressed for the specific trait. Only GS values with p < 0.01 were considered significant for the trait.
Enrichment analyses. WGCNA modular gene set enrichment analyses for Gene Ontology Biological Process (GO BP), KEGG pathways, and Reactome pathways were accomplished by using the Enrichr online webbased tool 19 . The terms presenting adjusted p-value < 0.05 for the modules or p-value < 0.05 for the hub genes, were considered significantly enriched. The same enrichment analysis strategy was applied for all hub genes and DE genes.
Statistical analysis. Significance analysis for microarray (SAM) using MeV software (version 4.9.0) was used for differential gene expression analyses. The relative expression of the differentially expressed genes (DEGs) was normalized with the endogenous reference gene GUSB for statistical analysis (Supplementary Table S3). The t-test was used for statistical analyses of differentially expressed genes. Mann Whitney, Chi-square, and Fisher exact tests were used for statistical analysis of the clinic-demographical data. All statistical analyses were performed in GraphPad Prism (version 8). The FAMD analysis was performed as described in "Factor analysis of mixed data (FAMD)" section.  Table 1. The median age for the Severe and Mild groups was significantly different: 50 years and 34 years, respectively. There was a predominance of males in the Severe group as compared to the Mild group. There were more young male patients (0-40 years of age) in the Severe group. Significant lower median hospital stay was observed for young patients (10-40 years of age) as compared with elderly patients (Table 1). In the Severe group more patients presented fever, dyspnea, wheezing in the chest, and vomiting, while more oligosymptomatic individuals presented coryza and headache (Table 1). A significantly higher number of patients in the Severe group presented at least one comorbidity. In decreasing order of frequency: systemic arterial hypertension (SAH), obesity, diabetes mellitus (DM), chronic heart conditions, chronic lung diseases, and chronic kidney disease ( Supplementary Fig. S4).
The hemogram results were interpreted as normal, high, or low, based on reference values for age and sex (Supplementary Table S4). The comparative analysis between the Severe and Mild groups showed that more individuals in the Severe group presented high values for segmented neutrophils and low values for erythrocytes, hematocrit, hemoglobin, lymphocytes, and immature neutrophils ( Supplementary Fig. S5). Severe patients had significantly elevated values for all hemogram-derived ratios when compared to Mild (i.e., oligosymptomatic) patients (Supplementary Table S5), and higher values were found for PLR and SII parameters in the subgroups B and D (patients over 41 years of age), thus confirming the positive correlation of elevated hemogram-derived ratios with inflammation and severe COVID-19 10 . Additionally, the individuals in the Severe group expressed higher levels of IL-6 and IL-10 when compared with the Mild group. The cytokines IL-4, IL-2, and IFN-γ, were also elevated in these individuals ( Supplementary Fig. S6).
FAMD analysis for clinical characteristics yielded six dimensional clusters for patients aged 0-9, 10-40 and over 40 years, in the Severe and Mild groups (Fig. 1). All patients selected for transcriptomics studies are contained in four dimensional clusters: 10-40 and over 40 years for both Severe and Mild groups.
WGCNA and module-trait correlation analysis. The normalized gene expression data of 6375 GO annotated genes were used for network construction and module identification by WGCNA. Nine transcriptional modules were identified. Module sizes ranged from 147 genes in the magenta module to 891 genes in the turquoise module ( Supplementary Fig. S7). The hierarchical clustering dendrogram of module-eigengenes revealed two meta-modules. In the meta-module I, which encompasses the blue, red, turquoise, magenta, pink, black, and yellow modules, the transcriptional modules are correlated with severe phenotypes, whereas the in the meta-module II, that includes the brown and green modules, the correlation is with mild phenotypes (Supplementary Fig. S8a). The module-trait correlation analysis identified three transcriptional modules with at least one significant correlation (p < 0.05) with disease severity (Severe; Mild) or with the A and B age subgroups www.nature.com/scientificreports/ We also performed a functional characterization of these three transcriptional modules and identified their high hierarchy (HH) genes ( Supplementary Fig. S9). Transcriptional modules often represent biological processes that can be phenotype-specific 18 . The functional enrichment among the genes within a module is widely used for disclosing its biological meaning 18 .
Yellow module. The enrichment analyses for this module (Fig. 2a, Supplementary Table S6) showed that the most represented cellular processes and functional pathways mainly reflect the inflammatory and innate immune responses against SARS-CoV-2 21 . Two terms are worth mentioning: (i) in the GO BP analysis, the most represented category is related to neutrophil activation (44 genes), what is quite expectable due to the role of neutrophils in restricting viral replication and diffusion 22 ; (ii) in the KEGG analysis, the genes of the HIF-1 signaling pathway, that are hyper-expressed in the peripheral blood mononuclear cells (PBMC) of COVID-19 patients 23 and aggravate inflammatory responses 24 .
The yellow module harbors 14 HH genes (Fig. 2b). The Hhubs TMEM59 25 and ATP6V1E1 26 , and the iHubs SPTLC1 27 and SNAPIN 28 are autophagy-related genes, the latter being critical for autophagosome maturation in macrophages. Two Hhubs-PRDX3 29 and SDHB 30 -are involved in the protection against oxidative stress. The eHub TXNDC12 (alias ERP16) takes part in the cellular defense against prolonged ER stress 31 . A decrease in ERstress was shown to occur in a stage-specific manner during neutrophil and macrophage differentiation 32 . The iHub C1GALT1C1 codes for a molecular chaperone (Cosmc) that plays a crucial role TRAIL-induced apoptosis 33 . A well-balanced IFN/TRAIL response is necessary for overcoming viral infections 34 . Finally, it is important to mention that in this module seven HH genes also have high GS values. Four of these genes-TMEM59, PRDX3, SPTLC1, and C1GALT1C1-are positively correlated with the Severe group and with high level of segmented neutrophils. Two other genes-EIF4A3, a modulator of the non-sense mediated mRNA decay pathway 35 and SNAPIN-are positively correlated with the Severe group. TENT2 (alias PAPD4), a gene involved in miRNA processing 36 , is positively correlated with high level of segmented neutrophils. Taken together, the gene enrichment analyses and the functional characterization of the HH genes for the yellow module fairly agree with the positive correlation of this module with COVID-19 severe cases, with subgroup B (aged and severe patients), and with high number of segmented neutrophils and lymphopenia (i.e., elevated NLR and SII; Supplementary Table S5), two hallmarks of COVID-19 severity 37 .
Magenta module. The enrichment analyses revealed a predominance of pathways related with innate and adaptive immune responses and inflammation (Fig. 3a, Supplementary Table S7). Moreover, the Fc epsilon receptor signaling pathway indicates the involvement of basophils and, indeed, it was recently shown that these cells play an active role in the immunity against SARS-CoV-2 38 .
The magenta module has 13 HH genes (Fig. 3b) and most of them act on the modulation of the innate and adaptive antiviral immune responses. The Hhub PPP2CA codifies protein phosphatase 2A, an important regulator of inflammatory signaling 39 . The Hhub RNF114 codes for a RING-type zinc-finger protein that modulates NF-κB activity and T-cell activation 40 . The Hhub HTR3A encodes the subunit A of the 5-HT3 receptor of serotonin, found in monocytes, B and T cells and probably involved in their recruitment to sites of inflammation 41 . www.nature.com/scientificreports/ Decreased serum levels of serotonin were found in severe cases of COVID-19, whereas in the mild cases the serotonin levels were close to those found in control individuals 42 . The Hhubs DHX15 43 and DHX29 44 code for two DEAH-box helicases that are key sensors for antiviral defense against RNA virus infection. The Hhub WASHC5 codes for strumpellin, a protein involved in endosomal fission 45 that was shown to interact directly with SARS-CoV-2 46 . The Hhub CNOT8 encodes the CNOT8 catalytic subunit of the CCR4-NOT complex, which is required for selective mRNA degradation (mRNA surveillance) and for preventing excessive inflammatory responses 47 . The eHub PSMD10 codes for gankyrin and promotes autophagy 48  www.nature.com/scientificreports/ (alias NORE1A), an apoptosis inducer and senescence effector 54 ; and ARL6IP1, that codes for an ER-shaping protein/ involved in the fine control of ER organization 55 .
Black module. The enriched pathways in the black module (Fig. 4a, Supplementary Table S8) reflect some of the main mechanisms involved in the innate immune and inflammatory responses to SARS-CoV-2. Many pathways, such as TLR/4, MyD88/TLR2, CASP8, TRAIL signaling, and necrosis (Fig. 4a) are mechanistically related and take part in the recognition of viral proteins, inflammatory responses, and cell death 56 . Interestingly, the increased expression of TLRs and MyD88 were found to be positively correlated with COVID-19 severity 57 . Noteworthy, the GO BP analysis shows enrichment for I-kappaB Kinase/NF-kappaB signaling pathway, a master regulator of inflammatory and immune responses 58 . This module has 13 HH genes (Fig. 4b) of which five are directly related to immune response regulation. The Hhub CACNA1C codes for the alpha-1 subunit of CaV1.2 calcium channel and is involved in the regulation of human Th2-lymphocyte functions 59 . The eHub RAC1 codifies a Rho GTPase that triggers NF-κB activation 60 . The eHub PTMA encodes prothymosin-alpha, an alarmin involved in cell proliferation, apoptosis, and immune regulation and whose expression in CD8 T memory stem cells is increased in COVID-19 61 . The iHub FOXP4 codifies a transcription factor required for T cell recall response to pathogens 62 . The iHub UCP3 codes for an uncoupling protein (UC) involved in the maintenance of Th17/Treg cell balance 63 , which is skewed towards Th17 in COVID-19 64 .
The other HH genes in the black module are involved in different biological functions, some of them possibly related to modulatory mechanisms acting on inflammatory and immune responses. The Hhub CGYB codes The terms with adjusted p < 0.05 were considered significant. (b) HH genes (Hhubs, iHubs, and eHubs) of this module, which is significantly and positively correlated with Severe group, subgroup A, and with H-SN traits. Each HH gene is identified by its hierarchical categorization, GO biological process or molecular function, and KEGG Pathways-related terms (in bold letters). This module did not present significant Gene Significance (GS) values for any specific trait (p < 0.01). www.nature.com/scientificreports/ for cytoglobin, a protein that protects cells against oxidative stress 65 . The Hhub FHL1 encodes a four-and-ahalf-LIM-domain protein involved in gene transcription regulation, cytoarchitecture, cell proliferation, and signal transduction 66 . Interestingly, FHL1 inhibits the vascular endothelial growth factor (VEGF) expression 67 and elevated serum levels of VEGF were positively correlated with disease severity in COVID-19 68 . The Hhub MISP3 codes for a protein involved in spindle orientation and mitotic progression 69 . The Hhub EVX1 encodes a transcriptional repressor 70 and the Hhub SHISA4 codes for a transmembrane adaptor protein 71 . The iHub DPEP3 encodes dipeptidase 3, involved in the hydrolytic metabolism of dipeptides, and the iHub PYY2 is a pseudogene coding for peptide YY2. Finally, the eHub HBZ codes for zeta hemoglobin and HBZ expression may be interpreted as evidence for circulating erythroid precursors in hospitalized (requiring oxygen) patients 72 , being negatively correlated with hemoglobin levels. HBZ has a negative GS value for the H-SN trait (Fig. 4b).
Differential gene expression analyses. Two comparative gene expression analyses (SAM) were conducted: (i) Severe (n = 11) vs. Mild (n = 12) groups; (ii) A (n = 5) vs. B (n = 6) subgroups. In the Severe vs. Mild comparison 50 differentially expressed genes (DEGs) were found, of which 49 were hyper-expressed in the Severe group, with fold-changes ranging between 8.8 and 2.0. The A vs. B comparison yielded only two DEGs. Subsequently, a normalized expression analysis (using GUSB as endogenous reference) showed that these two genes did not differ significantly between the subgroups A and B. The enrichment analysis for the DEG set of the Severe group showed that these genes are mainly related with neutrophil activation, innate immune response, and inflammation ( Supplementary Fig. S10, Table S9).
A normalized gene expression analysis (using GUSB as an endogenous reference gene) was then performed for the DEGs firstly selected according to KEGG and GO BP enrichment analyses. It was found that 23 genes in the Severe group were significantly (p < 0.005) hyper-expressed (Table 2). Noteworthy, 17 out of the 23 genes hyper-expressed in the Severe group had been previously reported to be hyper-expressed in the PBMC from COVID-19 patients according to the COVID-19 Related Gene Sets, a database in the Enrichr webtool library 19 www.nature.com/scientificreports/ (Table 2). It is also important to mention that 13 out of the 23 genes hyper-expressed in the Severe group are related with neutrophil activation.
DEGs as transcriptional biomarkers. The significant fold-change values and the biological functions of all DEGs found for the Severe group (Table 2, Supplementary Figs. S11, S12) clearly show their potentiality as transcriptional biomarkers. The ascribed roles of these genes-of which 13 are newly identified biomarkers (Table 2)-in the immune response to COVID-19 are addressed in the following paragraphs. Among the seven genes in the GO BP category "neutrophil activation involved in immune response" (Supplementary Fig. S11), three were already identified as COVID-19 biomarkers: CEACAM8 (alias CD66B), that codes for a neutrophil cell-adhesion protein and is highly expressed in patients with severe COVID-19 73 . ARG1, coding for arginase 1 and up-regulated in severe cases of COVID-19 74 ; and LCN2, which encodes lipocalin 2, a marker of neutrophil activation that was classified by machine learning algorithm as one the most potent discriminators of critical illness in COVID-19 75 . One DEG in this category, CAMP, codify for the antimicrobial molecule cathelicidin (LL37), a modulator of TLR activation and inflammation that has been proposed as a potential candidate for COVID-19 prevention and treatment 76 . The remaining three DEGs are related to neutrophil immune functions and metabolism: STOM, that codes for stomatin, a membrane protein associated with azurophilic granules 77 ; GGH, that encodes gamma-glutamyl hydrolase, a lysosomal enzyme involved in the immune response of neutrophils 78 ; and GYGY that codifies glycogenin, an enzyme involved in the synthesis of glycogen 79 .
In the GO BP category "neutrophil activation and positive regulation of cell death" (Supplementary Fig. S6) there are two DEGs. One is HP that codes for haptoglobin, a hemoglobin-binding plasma protein stored and released by neutrophils in response to activation 80 . Haptoglobin plasmatic levels were shown to correlate with disease severity in COVID-19, being reduced in critical patients 81 and elevated in COVID-19 children 82 . The other DEG, HPR, codes for a haptoglobin-related protein that binds hemoglobin was shown to be elevated in the sera of children with bacterial and viral pneumonia 83 . In the GO category "positive regulation of cell death and negative regulation of oxidoreductase activity" there is only one DEG, SCNA, which codes for alpha-synuclein, Table 2. DEGs selected as potential biomarkers for the Severe group and for the subgroup C. Genes in bold are hyper-expressed in the PBMC from COVID-19 patients (COVID-19 Related Gene Sets/Enrichr database). *t-test for relative expression of the DEGs normalized with endogenous reference gene GUSB. **Newly identified as a potential COVID-19 transcriptional biomarker.

Fold-change Relative (p-value)* GO BP or KEGG (in italic)
Severe × Mild www.nature.com/scientificreports/ a neuropeptide expressed in brain and also in mononuclear blood cells 84,85 that was shown to exhibit potent antiviral activity and capacity for signaling the immune system by attracting neutrophils and macrophages and activating dendritic cells 86,87 .
There are three DEGs in the GO BP category "neutrophil activation and negative regulation of apoptosis" (Supplementary Fig. S11). One is LTF, which codifies for lactoferrin, a relevant player in innate immunity with antiviral effects against SARS-CoV-2 and a wide range of viral species 88 . Serum levels of lactoferrin were found to be elevated in severe cases of COVID-19 89 . The other DEG is MPO, a gene that codes for myeloperoxidase, a leukocyte-derived enzyme whose plasmatic levels are elevated in mild to severe cases of COVID-19 and downregulated in patients with very severe disease 90 . The third gene is TXND5, that codes for the thioredoxin domain containing 5 (TXNDC5), an endoplasmic reticulum-resident protein that belongs to the thioredoxin family. The plasmatic levels of this protein are markedly elevated in septic patients, and it has been considered a therapeutic target for attenuating inflammatory responses 91 . Additionally, it was shown that TXND5 is hyper-expressed in B cells after COVID-19 vaccination, being a marker for seroconversion 92 . Therefore, the three genes are potential candidates for monitoring disease severity in COVID-19.
The GO BP category "platelet degranulation and neutrophil activation" has three DEGs ( Supplementary  Fig. S11). One of these genes is SELP, that codes for P-selectin, a platelet cell-adhesion molecule. Increased levels of P-selectin were found in severe cases of COVID-19, what contributes for a prothrombotic state in these patients 93 . The other two genes in this category are ORM1 and ORM2, and both are involved in the encoding of human orosomucoid protein, a major acute-phase plasma protein 94 . Recently, a proteomic analysis of serum from COVID-19 patients showed a significant down-regulation of the ORM1 protein 95 .
The KEGG pathway "p53 signaling and cellular senescence" (Supplementary Fig. S12) includes four DEGs. One of these genes is CDK1 that codes for the cyclin-dependent kinase 1 and is a master regulator of autophagy 96 . CDK1 was shown to be, through bioinformatics and machine learning approaches, a relevant Hub gene in the WBC transcriptome of COVID-19 patients, with high biomarker and therapeutic target potentials 97 . Other two DEGs in this pathway, CCNB1 and CCNB2, coding respectively for cyclin B1 and cyclin B2, are interactors of CDK1 and regulate the mammalian cell cycle 98 . The fourth DEG in this pathway, RRM2, interacts with CDK1 and CNBB1 in the p53 pathway 99 and was recently identified, through bioinformatic analysis, as a key gene in the infection of human intestines by SARS-CoV-2 100 .
Finally, there are three DEGs in the KEGG "Hippo signaling pathway" (Supplementary Fig. S12). One of these genes is BIRC5 that codes for survivin, an inhibitor of apoptosis protein. Survivin is indispensable for the homeostasis of the immune system, being required for innate and adaptive immune responses, differentiation of CD4+ and CD8+ memory T-cells, and for B cell maturation 101 . The other two DEGs are BMP6 and YWHAH. The former codes for the bone morphogenetic protein 6, a regulator of vascular homeostasis and angiogenesis 102 recently identified as an anti-inflammatory cytokine 103 . The latter codes for the 14-3-3η (eta) protein, a phosphoserine/phospho-threonine binding protein that interacts with a wide range of protein targets and participates in multiple cellular biological functions. The 14-3-3η protein is involved in the modulation of antiviral defenses via the RLR signaling pathway. The interaction between 14-3-3η and the melanoma differentiation-associated gene 5 (MDA5) accelerates the activation of the MDA5 signaling, thus helping host cells to mount an effective response against RNA viral infections 104 .
Subnetworks for module genes and DEGs. Subnetworks were constructed for the yellow and black modules. The subnetwork for the yellow module, positively correlated with the Severe group, was constructed using genes enriched for innate immune response and inflammation processes and pathways. The subnetwork for the black module, positively correlated with subgroup A, was constructed using genes enriched for inflammation, innate immune response, and immune response regulation. All these genes and their respective functions are listed in Supplementary Table S10. The DEGs identified in the Severe vs. Mild group comparison were also included in the yellow subnetwork (Supplementary Table S9). Consequently, these subnetworks were constructed with 141 genes for the yellow subnetwork and 33 genes for the black subnetwork. For all subnetworks a gene-gene link cut-off of r >|0.9| was adopted. The yellow subnetwork (Fig. 5) for the Severe group showed a higher value of connectivity (10.6) when compared with the subnetwork obtained for the Mild group (3.2). The subnetworks for the black module (Fig. 6) showed a higher connectivity for subgroup A (11.3) when compared with the subgroup B (6.0). Additionally, the yellow subnetwork contains more interconnected genes in the Severe group (104 genes and 551 links) than in the Mild group (74 genes and 241 links). It is worth to mention that the yellow subnetwork for the Severe group encompasses several genes involved in neutrophil activation (Fig. 5). Similarly, the black subnetwork for subgroup A contains many genes involved in the I-kappaB Kinase/ NF-kappaB signaling, an important modulator of inflammatory and immune responses (Fig. 6).

Discussion
The transcriptomic response of human leukocytes to SARS-CoV-2 infection was investigated by focusing the differences between mild and severe cases and between age subgroups (younger and older adults). The gain of this investigation strategy consists in combining host response transcriptomic data with clinical, demographic, and laboratory information, in a systems biology approach, for studying the immune response between mild and severe COVID-19 1,2,105 . Here, it was possible to functionally characterize the transcriptional modules correlated with age and disease severity, as well as to identify severity-associated DEGs, as discussed below. Still, this study faced two limitations, i.e., the relatively small number of patients included in the transcriptomic study (Table S2), thus impeding a statistical evaluation of the recovery parameter in these patients, and the reduced number of infants presenting severe symptoms, preventing us to investigate the leukocyte transcriptional response to SARS-CoV-2 in that young population. www.nature.com/scientificreports/ Initially, it is worth to recall that most of the patients in the Severe group presented fever, dyspnea, wheezing in the chest, chills, and vomiting, while in the Mild group a sizable number of patients solely presented mild symptoms, such as coryza and headache. It is well known that when the response to respiratory viruses is inadequate the infection spreads to lower respiratory tract (LRT) 106 . An early and effective immune response in patients infected with SARS-CoV-2 can limit and eliminate the infection yet in the upper respiratory tract (URT) 107 . Genomic analyses shown that there is no tissue specific genetic adaptation of SARS-CoV-2 to the URT or LRT 108 .
The transcriptional module differences between the Severe and Mild groups reflect and are coherent with the different presentation of symptoms and with COVID-19 severity. The functional enrichment analysis of the yellow module, associated with the Severe group, revealed that this module is related with inflammation and innate immune response, such as neutrophil activation, apoptosis, necroptosis, and HIF-1 signaling pathway. Expectedly, this module contains seven out of its 14 HH genes highly correlated with disease severity and/or high level of segmented neutrophils. Increased neutrophil count, immature circulating neutrophils, and neutrophil activation transcriptomic sig-natures are typically found in severe cases of COVID-19 37 . Interestingly, the HIF-1 pathway is associated with neutrophil activation, prolonged lifetime, and excessive function in COVID-19 severe patients 24,37 . www.nature.com/scientificreports/ Furthermore, the subnetwork constructed for the yellow module showed that most of the highly interconnected genes in the Severe group are hyper-expressed, and that many of these genes are involved in neutrophil activation, with other genes involved in necroptosis, apoptosis, and autophagy. This high network connectivity indicates that in the Severe group the biological functions and/or cellular processes related to innate immune response and inflammation are more activated than in the Mild group, shedding some light on the underlying transcriptomic mechanism, since the immune response to SARS-CoV-2 is characterized by neutrophil hyperactivation and high neutrophil counts 109 . Most of the patients of the Severe group included in this study presented high NLR as compared with the Mild group.
It is important to highlight that the young (subgroup A) and old patients (subgroup B) in the Severe group also presented differences in their transcriptomic profiles. Two modules-magenta and black-that are correlated with the subgroup A, contains many HH genes related to the innate immune response to viral infections, as well as other HH genes related to modulatory mechanisms acting on these responses. Moreover, in the black subnetwork the highest gene interconnectivity was found for subgroup A, thus indicating that immune modulatory functions are more activated in this subgroup when compared to subgroup B. These transcriptomic mechanisms could be related to the lower median hospital stay that has been observed for young patients (including our cohort) when compared with elderly patients 110 . In the elderly patients a less effective immune modulation of COVID-19, allied to immune senescence and, eventually, to comorbidities, would lead to a worse outcome 111 .
A total of 23 DEGs were found, all hyper-expressed in the Severe group. These genes are potential biomarkers for COVID-19 severity (13 are newly described). Remarkably, thirteen of these DEGs are involved in neutrophil activation, confirming the role of neutrophils in severe COVID-19 37 . Four DEGs are involved in the p53 signaling pathway, that has been associated with lymphopenia in severe COVID-19 patients 3 , and three with the Hippo pathway, recently associated with the antiviral host response in COVID-19 112 . Epidemiological studies show that age remains a relevant predictive factor for COVID-19 severity 111 , together with pre-existing conditions, such as obesity, DM and hypertension 113 . However, the molecular mechanisms determining COVID-19 severity are not yet well understood. Hence, there is a demand for biomarkers derived from comparative transcriptome analyses of mild and severe cases, combined with patients' clinico-demographic and laboratory data. These biomarkers may well be useful for the better stratification of risk factors in COVID-19.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. All microarray raw data have been deposited in GEO public database (http:// www. ncbi. nlm. nih. gov/ geo, token ahwlmsiervcnfid), a MIAME compliant database, under accession number GSE193022. All data will be released with paper publication. Figure 6. Subnetworks for the black module containing ME genes (non-HH module genes), hubs, and DEGs for the subgroups A and B. Red or blue links indicate positive or negative expression correlation values, respectively. Red or green border colors account for hyper-expressed or hypo-expressed genes, respectively. www.nature.com/scientificreports/